1. AN EXTENDED THEORY OF KNOWLEDGE



“Though it be allowed, that reason may form very plausible conjectures with regard to consequences of such
a particular conduct in such particular circumstances; it is still supposed imperfect, without assistance of
experience, which is alone able to give stability and certainty to the maxims, derived from study and
reflection.”

— David Hume 



1.1 The “New” AI

We argue that AI is moving into a new phase characterized by a broadened understanding of the nature of 
knowledge, and by the use of new computational paradigms. A sign of this transition is the growing 
interest in neurocomputers, optical computers, molecular computers and a new generation of massively 
parallel analog computers. In this section we outline the forces driving the development of this “new” AI. 
In the remainder of the paper we present the theory of field computers, which is intended to be a
comprehensive framework for this new paradigm. 

The “old” AI has been quite successful in performing a number of difficult tasks, such as theorem prov- 
ing, chess playing, medical diagnosis and oil exploration. These are tasks that have traditionally required 
human intelligence and considerable specialized knowledge. On the other hand, there is another class of 
tasks in which the old AI has made slower progress, such as speech understanding, image understanding, 
and sensorimotor coordination. It is interesting that these tasks apparently require less intelligence and 
knowledge than do the tasks that have been successfully attacked. Indeed, most of these recalcitrant tasks 
are performed skillfully by animals endowed with much simpler nervous systems than our own. How is 
this possible? 

It is apparent that animals perform (at least some) cognitive tasks very differently from computers. 
Neurons are slow devices. The well-known “Hundred Step Rule”[1] says that there cannot be more than
about a hundred sequential processing steps between sensory input and motor output. This suggests that 
nervous systems perform sensorimotor tasks by relatively shallow, but very wide (i.e., massively parallel) 
processing. Traditional AI technology depends on the digital computer’s ability to do very deep (millions 
of sequential operations), but narrow (1 to 100 processors) processing. Neurocomputing is an attempt to 
obtain some of the advantages of the way animals do things by direct emulation of their nervous systems.






1.2 Theoretical and Practical Knowledge 

One visible difference between the old and new AIs is in their computational strategies: the former stresses
deep but narrow processing, the latter shallow but wide processing. Underlying this difference there is a
deeper one: a difference in theories of knowledge. The old AI emphasizes propositional (verbalizable)
knowledge. That is, it assumes that all knowledge can be represented by sentence-like constructs (i.e.,
finite ensembles of discrete symbols arranged in accord with definite syntactic rules). The propositional
view is not new; it goes very far back, arguably to Pythagoras. Yet there is considerable evidence that
nonpropositional knowledge is at least as important.[2]

The problems of practical action, as opposed to theoretical contemplation, are too complicated for
propositional analysis. The real world is simply too messy for idealized theories to work. Representation in
terms of discrete categories, and cognition by manipulation of discrete structures referring to these
categories, may be appropriate to the idealized worlds of chess playing and theorem proving (although even
this is doubtful). However, in practical action the context looms large, as does the indefiniteness of
categories and the other second-order effects that propositional representation routinely idealizes away.

Of course, the approximations of propositional representation can be improved by a deeper theoretical 
analysis, but this greatly increases the computational burden. Traditional AI is faced with a dilemma: 
simple theories do not enable skillful behavior, but detailed theories are computationally infeasible. There 
might seem to be no way to avoid this tradeoff. But, recalling the Hundred Step Rule, and observing that 
animals behave skillfully, we realize that there must be a third alternative. 

The limitations of traditional AI technology show us the limitations of theoretical knowledge, i.e., 
knowledge that. There is, however, another kind of knowledge, which we can call practical knowledge, or 
knowledge how. For example, a fish knows how to maintain its depth in the water, but it does not know 
that neutral buoyancy is achieved by adjusting its specific gravity to that of water. The fish does not have 
an explicit (propositional) theory of how temperature, dissolved substances, etc. affect the specific gravity 
of water, nor does it know equations describing the complex manner in which its specific gravity depends 
on the state of its body (food in gullet, air in air bladder, etc. etc.). Rather, the fish’s knowledge how is 
represented nonpropositionally in its nervous system and body.



1.5 The Acquisition of Knowledge 

The foregoing suggests that by discovering how to represent and manipulate practical knowledge the new 
AI may accomplish what the old could not. There are difficulties, however. How is practical knowledge 
acquired? There are several ways theoretical knowledge is acquired: for example, it may be taught. Since 
propositions can be encoded in verbal structures, language can be used to transfer theoretical knowledge 
from one person to another. (Of course, more detailed theories require correspondingly larger verbal
structures for their encoding.) Thus, in principle, the representation of theoretical knowledge in a computer is
straightforward; we merely have to design an appropriate knowledge representation language. In effect,
theoretical knowledge is transferred from human to computer in the same way it is transferred from human
to human.

Before theoretical knowledge can be transferred it must be acquired in the first place. The original 
discovery of theoretical knowledge is beyond the scope of this paper. Here we restrict ourselves to the 
transfer of theoretical knowledge from one person to another; this is the case that is most important for 
expert systems and other applications of traditional AI technology. 

Since practical knowledge is nonpropositional, it cannot be encoded verbally. This does not mean it 
cannot be taught, however, since we can teach by showing as well as by saying. Therefore, although
theoretical knowledge is transferred by telling, practical knowledge is transferred by training. Indeed, we
often speak of “training” a neural network to accomplish some task.

We have seen how practical knowledge may be transferred. How is it acquired in the first place? In a 
word, by adaptation. In nature adaptation occurs predominantly at two levels: at the species level it leads 
to innate practical knowledge; at the individual level it leads to learned practical knowledge. The forego- 
ing suggests that where the old AI depended on verbal encoding and transfer, the new AI will emphasize 
training and adaptation as means of knowledge acquisition. 
2. FIELD TRANSFORMATION COMPUTERS



2.1 Massive Parallelism 

The preceding section suggests that the new AI will augment the traditional deep, narrow computation 
with shallow, wide computation. That is, the new AI will exploit massive parallelism. Now, massive
parallelism means different things to different people; massive parallelism may begin with a hundred, a
thousand, or a million processors. On the other hand, biological evidence suggests that skillful behavior
requires a very large number of processors, so many in fact that it is infeasible to treat them individually;
they must be treated en masse. This has motivated us to propose[5] the following definition of massive
parallelism: 

Definition (Massive Parallelism): A computational system is massively parallel if the number of pro- 
cessing elements is so large that it may conveniently be considered a continuous quantity. 

That is, a system is massively parallel if the processing elements can be considered a continuous mass
rather than a discrete ensemble.

How large a number is large enough to be considered a continuous quantity? That depends on the pur- 
pose at hand. A hundred is probably never large enough; a million is probably always large enough; a 
thousand or ten thousand may be enough. One of the determining factors will be whether the number is 
large enough to permit the application of continuous mathematics, which is generally more tractable than 
discrete mathematics. 

We propose this definition of massive parallelism for a number of reasons. First, as noted above, skill- 
ful behavior seems to require significant neural mass. Second, we are interested in computers, such as opti- 
cal computers and molecular computers, for which the number of processing elements is effectively continu- 
ous. Third, continuous mathematics is generally easier than discrete mathematics. And fourth, we want 
to encourage a new style of thinking about parallelism. Currently, we try to apply to parallel machines the 
thought habits we have acquired from thinking about sequential machines. This strategy works fairly well 
when the degree of parallelism is low, but it will not scale up. One cannot think individually about the
$10^{20}$ processors of a molecular computer. Rather than postpone the inevitable, we think that we should
begin now to develop a theoretical framework for understanding massively parallel computers. The
principal goal of this paper is to propose such a theory.



2.2 Field Transformation 

Our aim then is to develop a way of looking at massive parallelism that encompasses a variety of imple- 
mentation technologies, including neural networks, optical computers, molecular computers and a new gen- 
eration of analog computers. What these all have in common is the ability to process in parallel amounts 
of data so massive as to be considered a continuous quantity. This suggests that we structure our theory
around the idea of a field, i.e., a continuous (dense) ensemble of data. We have in mind both scalar fields
(such as potential fields) and vector fields (such as gravitational fields). Any operation on such a field, 
either to produce another field or to produce a new state of the field, can be considered massively parallel, 
since it operates on all the elements of the field in parallel. Indeed, it would not be feasible to serialize the 
processing of the field; modest degrees of parallelism cannot cope with the large number of field elements. 

In the remainder of this paper we explore field transformation computers, that is, computers character- 
ized by the ability to perform (in parallel) transformations on scalar and vector fields. We are not suggest- 
ing that field computers are unable to perform scalar calculations; in fact we assume that field transforma- 
tion computers have the scalar capabilities of conventional digital and analog computers. Scalars have 
many uses in field computation. For example, we may want to use a scalar parameter to control the rate 
at which a field transformation takes place (e.g., a reaction rate in a molecular computer). Similarly, we 
may use a scalar representing the average intensity of a field to control the contrast enhancement of that 
field. A scalar threshold value may be used to suppress low level noise, and so forth. 
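To make the role of scalars concrete, here is a minimal sketch. It assumes, purely for illustration, that a field is stored as a Python list of sampled values; the names `mean_intensity`, `contrast_enhance` and `suppress_noise` are hypothetical and are not operations defined in this paper. A scalar (the field's average intensity, or a threshold level) is computed or supplied once, and then the same elementwise operation is applied to every sample, which is the part a field computer would perform in parallel.

```python
def mean_intensity(field):
    """Scalar summary of a field: the average of its samples."""
    return sum(field) / len(field)


def contrast_enhance(field, gain):
    """Push every sample away from the field's mean by the scalar factor `gain`."""
    mean = mean_intensity(field)  # one scalar derived from the whole field
    return [mean + gain * (x - mean) for x in field]


def suppress_noise(field, level):
    """Zero out samples whose magnitude falls below the scalar threshold `level`."""
    return [x if abs(x) >= level else 0.0 for x in field]


if __name__ == "__main__":
    phi = [0.02, 0.40, 0.55, -0.03, 0.48]
    print(suppress_noise(contrast_enhance(phi, gain=1.5), level=0.05))
```

The scalar parameters (`gain`, `level`) play the controlling role described above; the per-sample work is independent across samples.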

An important reason for combining field computation with conventional digital computation is that it
permits knowing how to be combined with knowing that, leading to knowledgeable, skillful behavior. The
combined use of practical and theoretical knowledge is unfortunately beyond the scope of this paper.

2.3 Classes of Field Transformations

Field transformations, like filters, can be divided into two classes: nonrecursive and recursive. A nonre- 
cursive transformation is simply a functional composition of more elementary transformations. The output 
of a nonrecursive transformation depends only on its input. A recursive transformation involves some kind 
of feedback. Hence, its output depends both on its input and on its prior state. Recursive transformations 
are ideal for simulating the temporal behavior of a physical system, for example in simulated annealing[4]
and Boltzmann machines.[5]
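A small sketch of the two classes, again assuming fields are lists of samples. `smooth`, `square`, `compose` and `relax` are hypothetical illustrations: the composed transformation is nonrecursive because its output is a function of the input alone, while `relax` is recursive because each call feeds the previous state back in. The leaky relaxation used here is chosen only as the shortest example of feedback; it is not simulated annealing or a Boltzmann machine.

```python
def smooth(field):
    """Nonrecursive: 3-point moving average (edge samples reuse themselves)."""
    n = len(field)
    return [
        (field[max(i - 1, 0)] + field[i] + field[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def square(field):
    """Nonrecursive: pointwise square."""
    return [x * x for x in field]


def compose(*transforms):
    """Functional composition: the result depends only on the input field."""
    def composed(field):
        for transform in transforms:
            field = transform(field)
        return field
    return composed


def relax(state, target, rate):
    """Recursive: the new state depends on the input *and* the prior state."""
    return [s + rate * (x - s) for s, x in zip(state, target)]


if __name__ == "__main__":
    print(compose(square, smooth)([1.0, 2.0, 3.0]))
    state = [0.0, 0.0, 0.0]
    for _ in range(20):  # iterate the feedback loop
        state = relax(state, [1.0, -1.0, 0.5], rate=0.3)
    print(state)
```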

2.4 General Purpose Field Computers 

Many field computers are designed for special purposes; this has been the case with field computers to date, 
and we expect it to be the case in the future. In these computers, devices implementing field transforma- 
tions (such as filters and convolutions) are assembled to solve a small class of problems (e.g., pattern recog- 
nition). On the other hand, our experience with digital computation has shown us the value of general 
purpose or programmable computers. This architectural feature permits one computer to perform a variety 
of digital computations, which eliminates the need to construct special purpose devices, and speeds imple- 
mentation of digital algorithms. 

The foregoing observations suggest that general purpose field computers will be similarly valuable. In 
these the connections between field transformation units and field storage units are programmable, thus
facilitating their reconnection for a variety of purposes. In fact, we may want to make better use of our 
resources by multiplexing the use of field transformation units under the control of a program. Thus, a pro- 
gram for a general purpose field computer might look very much like a conventional program, except that 
the basic operations are field transformations rather than scalar arithmetic. 
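The idea of a program whose basic operations are field transformations can be sketched as a tiny register machine. Everything here is an assumption made for illustration: the instruction format, the register names, and the pointwise units are hypothetical, not an instruction set proposed in this paper. Reusing the single `mul` unit for different operands is the kind of multiplexing mentioned above.

```python
def _pointwise(op):
    """Lift a binary scalar operation to a field (list-of-samples) operation."""
    return lambda a, b: [op(x, y) for x, y in zip(a, b)]


# Hypothetical field transformation units available to the program.
FIELD_UNITS = {
    "add": _pointwise(lambda x, y: x + y),
    "sub": _pointwise(lambda x, y: x - y),
    "mul": _pointwise(lambda x, y: x * y),
}


def run(program, registers):
    """Execute (op, dest, *operands) instructions against named field registers."""
    for op, dest, *operands in program:
        if op == "scale":  # scalar times field
            factor, src = operands
            registers[dest] = [factor * x for x in registers[src]]
        else:  # binary field transformation unit
            a, b = (registers[name] for name in operands)
            registers[dest] = FIELD_UNITS[op](a, b)
    return registers


# Hypothetical program: out = 0.5 * (phi - phi0) * (phi - phi0)
PROGRAM = [
    ("sub", "alpha", "phi", "phi0"),
    ("mul", "sq", "alpha", "alpha"),
    ("scale", "out", 0.5, "sq"),
]

if __name__ == "__main__":
    regs = run(PROGRAM, {"phi": [3.0, 4.0], "phi0": [1.0, 1.0]})
    print(regs["out"])
```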

We cannot build into a general purpose field computer every transformation we might need. Instead we 
must choose a set of primitive operations that permit the programming of all others. How can such a set 
of primitive operations be chosen? How can we be guaranteed that we have provided all the necessary
facilities? For digital computers this question is answered in part by computability theory. For example,
this theory shows us how to construct a universal Turing machine, which, given an appropriate program,
can emulate any Turing machine. Although the universal Turing machine is hardly a practical general 
purpose computer, consideration of it and other universal machines shows us the kinds of facilities a com- 
puter must have in order to be universal. There follows the hard engineering job of going from the theoret- 
ically sufficient architecture to the practically necessary architecture. 

Can the same be accomplished for field computers? Is there a universal field computer that can emulate 
any field computer? If there is such a thing, then we can expect that it may form a basis for practical
general purpose field computers in much the same way that Turing machines do for digital computers. In the
next section we prove that general purpose field computation is possible.

3. A UNIVERSAL FIELD COMPUTER 
3.1 Introduction

In this section we develop the general theory of field computation and prove the existence of a universal 
field computer. In particular, we show that with a certain set of built-in field transformations we can
implement (to a desired degree of accuracy) any field transformation in a very wide class. This is analo- 
gous to the result from Turing machine theory: The universal Turing machine allows us to implement (to 
a desired degree of accuracy) any function in a wide class (now known as the computable functions). 

The phrase ‘to a desired degree of accuracy’ appears in both of the preceding statements. What does it 
mean? For the Turing machine it means that a given accuracy (e.g., precision or range of argument) can 
be achieved by providing a long enough tape. For the digital computer it means that computations are 
normally performed to a given precision (e.g., the word length), and that finite increments in the desired 
precision require finite increments in the resources required (e.g., additional registers and memory cells for 
double and multiple precision results, or stack space for recursion). The case is much the same for the 
universal field computer. Finite increments in the desired accuracy of a field transformation will require 
finite increments in the resources used (such as field transformation and storage units). 

There are a number of theoretical bases for a universal field computer. We have investigated designs 
based on Fourier analysis, interpolation theory and Taylor’s theorem, all generalized for field transforma- 
tions. In this paper we present the design based on Taylor’s theorem. There are no doubt as many princi- 
ples upon which universal field computers can be based as there are bases for universal digital computers. 

3.2 Taylor Series Approximation of Field Transforms 

In this section we develop the basic theory of functions on scalar and vector fields and of their
approximation by Taylor series. Most of the definitions and theorems in sections 3.2 and 3.3 have been
previously published[0]; they are reproduced here for completeness. Once it is understood that fields are
treated as continuous-dimensional vectors, it will be seen that the mathematics is essentially that of
finite-dimensional vectors. Note that the treatment here is heuristic rather than rigorous. First we
consider scalar fields; later we turn to vector fields.



As usual we take a scalar field to be a function $\phi$ from an underlying set $\Omega$ to an algebraic field $K$, thus
$\phi: \Omega \to K$. For our purposes $K$ will be the field of real numbers, $\mathbb{R}$. We use the notation $\Phi(\Omega)$ for the
set of all scalar fields over the underlying set $\Omega$ ($K = \mathbb{R}$ being understood). Thus, $\Phi(\Omega)$ is a function
space, and in fact a linear space under the following definitions of field sum and scalar product:

$$\begin{aligned}
(\phi + \psi)_t &= \phi_t + \psi_t \\
(\lambda \phi)_t &= \lambda(\phi_t)
\end{aligned} \qquad (1)$$

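Note that we often write $\phi_t$ for $\phi(t)$, the value of the field at the point $t$. As a basis for this linear space
we take the unit functions $\omega_t$ for each $t \in \Omega$. They are defined

$$\begin{aligned}
\omega_t(t) &= 1 \\
\omega_t(s) &= 0, \quad \text{if } s \neq t
\end{aligned} \qquad (2)$$

Note that

$$\phi = \int_\Omega \phi_t \, \omega_t \, dt \qquad (3)$$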



The preceding definitions show that we can think of scalar fields as vectors over the set $\Omega$. Since we want
to be quite general, we assume only that $\Omega$ is a measurable space. In practice, it will usually be a closed
and bounded subspace of $E^n$, $n$-dimensional Euclidean space. Thus we typically have one-, two- and
three-dimensional closed and bounded scalar fields.



Since $\Omega$ is a measure space, we can define an inner product between scalar fields:

$$\phi \cdot \psi = \int_\Omega \phi_t \, \psi_t \, dt \qquad (4)$$



We also define the norm:

$$\|\phi\| = \int_\Omega |\phi_t| \, dt \qquad (5)$$

Thus $\Phi(\Omega)$ is the function space $L^1(\Omega)$. Note that the $\omega_t$ are not an orthonormal set under this norm, since
$\|\omega_t\| = 0$.
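On a digital machine these integrals can only be approximated. A minimal sketch, assuming $\Omega = [0, 1]$ sampled at $N$ midpoints so that each integral becomes a Riemann sum; `sample`, `inner`, `norm` and `unit_sample` are hypothetical helper names. The last column printed shows the discrete analogue of $\|\omega_t\| = 0$: a field that is 1 at a single sample has norm $1/N$, which vanishes as the sampling is refined.

```python
def sample(func, n):
    """Sample func at the n midpoints of [0, 1]; return (samples, spacing dt)."""
    dt = 1.0 / n
    return [func((i + 0.5) * dt) for i in range(n)], dt


def inner(phi, psi, dt):
    """Riemann-sum approximation of the inner product in Eq. (4)."""
    return sum(p * q for p, q in zip(phi, psi)) * dt


def norm(phi, dt):
    """Riemann-sum approximation of the L1 norm in Eq. (5)."""
    return sum(abs(p) for p in phi) * dt


def unit_sample(j, n):
    """Discrete stand-in for omega_t: 1 at sample j, 0 elsewhere."""
    return [1.0 if i == j else 0.0 for i in range(n)]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        ramp, dt = sample(lambda t: t, n)
        ones, _ = sample(lambda t: 1.0, n)
        print(n, inner(ramp, ones, dt), norm(unit_sample(0, n), dt))
```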
We first consider scalar valued functions of scalar fields, that is, functions $f: \Phi(\Omega) \to \mathbb{R}$. We prove
some basic properties of these functions, culminating in Taylor's theorem.



Definition (Differentiability): Suppose $f$ is a scalar valued function of scalar fields, $f: \Phi(\Omega) \to \mathbb{R}$, and
that $f$ is defined on a neighborhood of $\phi \in \Phi(\Omega)$. Then we say that $f$ is differentiable at $\phi$ if there is a
field $G \in \Phi(\Omega)$ such that for all $\alpha$ in this neighborhood

$$f(\phi + \alpha) - f(\phi) = \alpha \cdot G + \eta \|\alpha\| \qquad (6)$$

where $\eta \to 0$ as $\|\alpha\| \to 0$. We will later call $G$ the gradient of $f$ at $\phi$.



Theorem: If $f$ is differentiable at $\phi$ then $f$ is continuous at $\phi$.



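Proof: Since $f$ is differentiable at $\phi$ we know

$$f(\psi) - f(\phi) = (\psi - \phi) \cdot G + \eta \|\psi - \phi\|.$$

Therefore,

$$\begin{aligned}
|f(\psi) - f(\phi)| &= \left| (\psi - \phi) \cdot G + \eta \|\psi - \phi\| \right| \\
&\le |(\psi - \phi) \cdot G| + |\eta| \, \|\psi - \phi\| \\
&\le \|G\| \, \|\psi - \phi\| + |\eta| \, \|\psi - \phi\| \\
&= (\|G\| + |\eta|) \, \|\psi - \phi\|.
\end{aligned}$$

Since the right-hand side vanishes as $\|\psi - \phi\| \to 0$, $f$ is continuous at $\phi$. ■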



The quantity $\alpha \cdot G$ whose existence is guaranteed by differentiability is called the directional
derivative of $f$ with respect to $\alpha$ at $\phi$. It is defined directly as follows.



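Definition (Directional Derivative): The directional derivative in the “direction” $\alpha$ is given by the
following limit:

$$\nabla_\alpha f(\phi) = \frac{\partial f(\phi)}{\partial \alpha} = \lim_{h \to 0} \frac{f(\phi + h\alpha) - f(\phi)}{h} \qquad (7)$$

We use $\nabla_\alpha f$ and $\partial f / \partial \alpha$ interchangeably for the directional derivative. Note that if $f$ is differentiable at $\phi$
then $\nabla_\alpha f(\phi) = \alpha \cdot G$.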
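Definition (7) can be checked numerically. The sketch below assumes a midpoint discretization of $\Omega = [0, 1]$ and uses a hypothetical example functional $f(\phi) = \int_\Omega \phi_t^3 \, dt$, whose field $G$ in Eq. (6) is $G_t = 3\phi_t^2$. A forward difference with small $h$ stands in for the limit, and it should agree closely with $\alpha \cdot G$.

```python
import math

N = 200
DT = 1.0 / N
TS = [(i + 0.5) * DT for i in range(N)]


def inner(a, b):
    """Riemann-sum inner product on [0, 1]."""
    return sum(x * y for x, y in zip(a, b)) * DT


def f(phi):
    """Example functional: f(phi) = integral of phi_t**3 over [0, 1]."""
    return sum(x ** 3 for x in phi) * DT


def field_G(phi):
    """The field G of Eq. (6) for this f: G_t = 3 * phi_t**2."""
    return [3.0 * x ** 2 for x in phi]


def directional_derivative(func, phi, alpha, h=1e-6):
    """Forward-difference approximation of the limit in Eq. (7)."""
    moved = [p + h * a for p, a in zip(phi, alpha)]
    return (func(moved) - func(phi)) / h


phi = [math.sin(t) for t in TS]
alpha = [t * t for t in TS]

if __name__ == "__main__":
    print(directional_derivative(f, phi, alpha), inner(alpha, field_G(phi)))
```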
Lemma: If $f$ is differentiable in a neighborhood of $\phi$, then

$$\frac{d}{dx} f(\phi + x\alpha) = \nabla_\alpha f(\phi + x\alpha). \qquad (8)$$

Proof: By the definition of the derivative:

$$\begin{aligned}
\frac{d}{dx} f(\phi + x\alpha) &= \lim_{h \to 0} \frac{f(\phi + (x + h)\alpha) - f(\phi + x\alpha)}{h} \\
&= \lim_{h \to 0} \frac{f(\phi + x\alpha + h\alpha) - f(\phi + x\alpha)}{h} \\
&= \nabla_\alpha f(\phi + x\alpha)
\end{aligned}$$

The last step follows by the definition of the directional derivative. ■

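Theorem (Mean Value): Suppose $f: \Phi(\Omega) \to \mathbb{R}$ is continuous on a neighborhood containing $\phi$ and $\psi$.
Then, there is a $\theta$, $0 \le \theta \le 1$, such that

$$f(\psi) - f(\phi) = (\psi - \phi) \cdot \nabla f(\chi), \quad \text{where } \chi = \phi + \theta(\psi - \phi). \qquad (9)$$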



Proof: Let $\alpha = \psi - \phi$ and consider the function

$$F(x) = f(\phi + x\alpha) - f(\phi) - x[f(\psi) - f(\phi)].$$

Since $f$ is continuous, so is $F$. Now, since $F(0) = F(1) = 0$, we have by Rolle's Theorem that there is a
$\theta$, $0 \le \theta \le 1$, such that $F'(\theta) = 0$. Note that

$$F'(x) = \frac{d}{dx}\left\{ f(\phi + x\alpha) - f(\phi) - x[f(\psi) - f(\phi)] \right\} = \frac{d}{dx} f(\phi + x\alpha) - [f(\psi) - f(\phi)].$$

By the preceding lemma

$$F'(x) = \nabla_\alpha f(\phi + x\alpha) - [f(\psi) - f(\phi)].$$
Hence, substituting $\theta$ for $x$,

$$0 = F'(\theta) = \nabla_\alpha f(\phi + \theta\alpha) - [f(\psi) - f(\phi)].$$

Therefore, transposing we have

$$f(\psi) - f(\phi) = \nabla_\alpha f(\phi + \theta\alpha)$$

and the theorem is proved. ■

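Theorem (Taylor): Suppose that $f$ and all its directional derivatives through order $n + 1$ are continuous
in a neighborhood of $\phi$. Then for all $\alpha$ such that $\phi + \alpha$ is in that neighborhood there is a $\theta$, $0 \le \theta \le 1$,
such that

$$\begin{aligned}
f(\phi + \alpha) = f(\phi) &+ \nabla_\alpha f(\phi) + \frac{1}{2} \nabla_\alpha^2 f(\phi) + \cdots \\
&+ \frac{1}{n!} \nabla_\alpha^n f(\phi) + \frac{1}{(n+1)!} \nabla_\alpha^{n+1} f(\phi + \theta\alpha).
\end{aligned} \qquad (10)$$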



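Proof: By the Taylor theorem on real variables, there is a $\theta$ with $0 \le \theta \le t$ such that

$$\begin{aligned}
f(\phi + t\alpha) = f(\phi) &+ \left.\frac{d}{dt} f(\phi + t\alpha)\right|_{t=0} t + \frac{1}{2} \left.\frac{d^2}{dt^2} f(\phi + t\alpha)\right|_{t=0} t^2 + \cdots \\
&+ \frac{1}{n!} \left.\frac{d^n}{dt^n} f(\phi + t\alpha)\right|_{t=0} t^n + \frac{1}{(n+1)!} \left.\frac{d^{n+1}}{dt^{n+1}} f(\phi + t\alpha)\right|_{t=\theta} t^{n+1}.
\end{aligned}$$

Observe that by the preceding lemma (applied repeatedly)

$$\frac{d^k}{dt^k} f(\phi + t\alpha) = \nabla_\alpha^k f(\phi + t\alpha).$$

Therefore,

$$\begin{aligned}
f(\phi + t\alpha) = f(\phi) &+ \nabla_\alpha f(\phi) \, t + \frac{1}{2} \nabla_\alpha^2 f(\phi) \, t^2 + \cdots \\
&+ \frac{1}{n!} \nabla_\alpha^n f(\phi) \, t^n + \frac{1}{(n+1)!} \nabla_\alpha^{n+1} f(\phi + \theta\alpha) \, t^{n+1}.
\end{aligned}$$

Setting $t = 1$ gives the desired result. ■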
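As a sanity check on (10), take the hypothetical cubic functional $f(\phi) = \int_\Omega \phi_t^3 \, dt$. Expanding $f(\phi + \alpha)$ directly gives $\nabla_\alpha f(\phi) = 3\int_\Omega \phi_t^2 \alpha_t \, dt$, $\nabla_\alpha^2 f(\phi) = 6\int_\Omega \phi_t \alpha_t^2 \, dt$ and $\nabla_\alpha^3 f = 6\int_\Omega \alpha_t^3 \, dt$, which does not depend on the point of evaluation. So with $n = 2$ the remainder equals $\frac{1}{6}\nabla_\alpha^3 f$ for every $\theta$, and the series is exact. The sketch below, assuming a midpoint discretization of $\Omega = [0, 1]$, confirms this numerically and shows that dropping the last term leaves exactly that remainder.

```python
import math

N = 100
DT = 1.0 / N
TS = [(i + 0.5) * DT for i in range(N)]


def integral(values):
    """Riemann sum over the sampled [0, 1]."""
    return sum(values) * DT


def f(phi):
    """Example functional: integral of phi_t**3."""
    return integral([x ** 3 for x in phi])


def directional_derivatives(phi, alpha):
    """Closed-form nabla_alpha^k f(phi) for k = 1, 2, 3 (this particular f only)."""
    d1 = 3.0 * integral([p * p * a for p, a in zip(phi, alpha)])
    d2 = 6.0 * integral([p * a * a for p, a in zip(phi, alpha)])
    d3 = 6.0 * integral([a ** 3 for a in alpha])
    return d1, d2, d3


phi = [math.cos(2.0 * t) for t in TS]
alpha = [0.3 * t - 0.1 for t in TS]
d1, d2, d3 = directional_derivatives(phi, alpha)

exact = f([p + a for p, a in zip(phi, alpha)])
second_order = f(phi) + d1 + d2 / 2.0  # Eq. (10) with n = 2, remainder dropped
with_remainder = second_order + d3 / math.factorial(3)

if __name__ == "__main__":
    print(exact - second_order, d3 / 6.0, exact - with_remainder)
```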



The extension to a function of several scalar fields is routine. 

Since our “vectors” are continuous-dimensional, partial derivatives are with respect to a “coordinate”
$t \in \Omega$ rather than with respect to a coordinate variable. To define partial derivatives it's convenient to
make use of the Dirac delta functions, $\delta_t$, for $t \in \Omega$:
$$\begin{aligned}
\delta_t(t) &= \infty \\
\delta_t(s) &= 0, \quad \text{for } s \neq t
\end{aligned} \qquad (11)$$

Of course, by the first equation above we mean $\delta_t(s) = \lim_{\epsilon \to 0} \epsilon^{-1}$ for $|s - t| < \epsilon/2$. Note the following
properties of the delta functions (fields):

$$\begin{aligned}
\|\delta_t\| &= 1 \\
\delta_t \cdot \phi &= \phi_t
\end{aligned} \qquad (12)$$



Given the delta functions the partial derivative of $f$ at coordinate $t$ is simply expressed:

$$\frac{\partial f}{\partial \delta_t} = \nabla_{\delta_t} f \qquad (13)$$



Theorem: If $f$ is differentiable at $\phi$ then the first order partial derivatives exist at $\phi$.
Proof: First observe that by differentiability

$$\begin{aligned}
\frac{f(\phi + h\delta_t) - f(\phi)}{h} &= \frac{f(\phi) + h\delta_t \cdot G + \eta \|h\delta_t\| - f(\phi)}{h} \\
&= \delta_t \cdot G + \eta \|\delta_t\| \, |h| / h \\
&= \delta_t \cdot G + \eta \, |h| / h \\
&= G_t + \eta \, |h| / h
\end{aligned}$$

Recalling that $\eta \to 0$ as $h \to 0$, observe

$$\left| \lim_{h \to 0} \frac{f(\phi + h\delta_t) - f(\phi)}{h} - G_t \right| = \lim_{h \to 0} \left| G_t + \eta \, |h| / h - G_t \right| = \lim_{h \to 0} |\eta| = 0$$

Hence, $\frac{\partial}{\partial \delta_t} f(\phi) = G_t$, where $G$ is the field whose existence is guaranteed by differentiability. Thus the
partial derivative exists. ■



What is the field $G$ whose points are the partial derivatives? It is just the gradient of the function.

Definition (Gradient): The gradient of $f: \Phi(\Omega) \to \mathbb{R}$ at $\phi$ is a field whose value at a point $t$ is the partial
derivative at that point, $\frac{\partial}{\partial \delta_t} f(\phi)$:

$$[\nabla f(\phi)]_t = \frac{\partial}{\partial \delta_t} f(\phi). \qquad (14)$$



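Since, by Eq. 3, $\nabla f(\phi) = \int_\Omega [\nabla f(\phi)]_t \, \omega_t \, dt$, the gradient can also be expressed in terms of the basis
functions and the partial derivatives:

$$\nabla f(\phi) = \int_\Omega \omega_t \frac{\partial}{\partial \delta_t} f(\phi) \, dt. \qquad (15)$$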



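When no confusion will result, we use the following operator notations:

$$\begin{aligned}
\nabla f &= \int_\Omega \omega_t \, \partial f / \partial \delta_t \, dt \\
\nabla &= \int_\Omega \omega_t \, \partial / \partial \delta_t \, dt \\
\partial / \partial \delta_t &= \delta_t \cdot \nabla = \nabla_t
\end{aligned} \qquad (16)$$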

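Finally, since $\nabla f(\phi) = G$, the field guaranteed by differentiability, and $\nabla_\alpha f(\phi) = \alpha \cdot G$, we know

$$\frac{\partial f(\phi)}{\partial \alpha} = \nabla_\alpha f(\phi) = \alpha \cdot \nabla f(\phi) \qquad (17)$$

or, in operator form, $\partial / \partial \alpha = \nabla_\alpha = \alpha \cdot \nabla$.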
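The partial derivative (13) and gradient (14) can be approximated in the same discretized setting. The sketch assumes $\Omega = [0, 1]$ sampled at $N$ midpoints and represents $\delta_t$ by a hypothetical spike of height $1/\Delta t$ at one sample, which keeps $\|\delta_t\| = 1$ and $\delta_t \cdot \phi = \phi_t$ as in (12). For the example functional $f(\phi) = \int_\Omega \phi_t^3 \, dt$ the numerical gradient should approach $3\phi_t^2$, and $\alpha \cdot \nabla f(\phi)$ should match the directional derivative, as (17) states.

```python
import math

N = 100
DT = 1.0 / N
TS = [(i + 0.5) * DT for i in range(N)]


def inner(a, b):
    return sum(x * y for x, y in zip(a, b)) * DT


def norm(a):
    return sum(abs(x) for x in a) * DT


def delta(j):
    """Discrete stand-in for delta_t at sample j: height 1/DT, unit area."""
    return [1.0 / DT if i == j else 0.0 for i in range(N)]


def f(phi):
    """Example functional: integral of phi_t**3."""
    return sum(x ** 3 for x in phi) * DT


def partial(func, phi, j, h=1e-7):
    """Forward-difference version of d f / d delta_t at sample j."""
    d = delta(j)
    return (func([p + h * q for p, q in zip(phi, d)]) - func(phi)) / h


def gradient(func, phi):
    """Field whose value at each sample is the partial derivative there (Eq. 14)."""
    return [partial(func, phi, j) for j in range(N)]


phi = [math.cos(t) for t in TS]
alpha = [math.sin(3.0 * t) for t in TS]
grad = gradient(f, phi)

if __name__ == "__main__":
    h = 1e-6
    moved = [p + h * a for p, a in zip(phi, alpha)]
    print(inner(alpha, grad), (f(moved) - f(phi)) / h)
```

The spike representation is only one way to discretize $\delta_t$; it is used here because it preserves the two properties in (12) exactly.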

3.3 A Universal Field Computer Based on Taylor Series Approximation 

We can use Taylor’s Theorem to derive approximations of quite a general class of scalar valued functions 
of scalar fields. Thus, if we equip our universal field computer with the hardware necessary to compute 
Taylor series approximations, then we will be able to compute any of a wide class of functions (namely, 
those functions whose first n partial derivatives exist and are continuous). Therefore, consider the general 
form of an n-term Taylor series: 
$$f(\phi) \approx \sum_{k=0}^{n} \frac{1}{k!} \nabla_\alpha^k f(\phi_0), \quad \text{where } \alpha = \phi - \phi_0 \qquad (18)$$



What hardware is required? Clearly we will need a field subtractor for computing the difference field
$\alpha = \phi - \phi_0$. We will also need a scalar multiplier for scaling each term by $1/k!$; we will also need a scalar
adder for adding the terms together. The harder problem is to find a way to compute $\nabla_\alpha^k f(\phi_0)$ for a vector
$\alpha$ that depends on the (unknown) input $\phi$. The trouble is that the $\alpha$s and the $\nabla$s are interleaved, as can
be seen here:



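$$\begin{aligned}
\nabla_\alpha^k f(\phi_0) &= (\alpha \cdot \nabla)^k f(\phi_0) \\
&= (\alpha \cdot \nabla)^{k-1} [\alpha \cdot \nabla f(\phi_0)] \\
&= (\alpha \cdot \nabla)^{k-1} \int_\Omega \alpha_{t_1} \frac{\partial}{\partial \delta_{t_1}} f(\phi_0) \, dt_1 \\
&\;\;\vdots \\
&= \int_\Omega \cdots \int_\Omega \alpha_{t_1} \alpha_{t_2} \cdots \alpha_{t_k} \frac{\partial^k}{\partial \delta_{t_1} \partial \delta_{t_2} \cdots \partial \delta_{t_k}} f(\phi_0) \, dt_1 \, dt_2 \cdots dt_k
\end{aligned}$$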



We want to separate everything that depends on $\alpha$, and is thus variable, from everything that depends on
$f(\phi_0)$, and is thus fixed. This can be accomplished (albeit with extravagant use of our dimensional
resources) by means of an outer product operation. Therefore we define the outer product of two scalar
fields:



$$(\phi \wedge \psi)_{st} = \phi_s \psi_t \qquad (19)$$

Note that if $\phi, \psi \in \Phi(\Omega)$ then $\phi \wedge \psi \in \Phi(\Omega^2)$.

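To see how the outer product allows the variable and fixed parts to be separated, consider first the case
$\nabla_\alpha^2$:

$$\begin{aligned}
\nabla_\alpha^2 f(\phi_0) &= \int_\Omega \int_\Omega \alpha_s \alpha_t \frac{\partial^2}{\partial \delta_s \partial \delta_t} f(\phi_0) \, ds \, dt \\
&= \int_\Omega \int_\Omega (\alpha \wedge \alpha)_{st} (\nabla)_s (\nabla)_t f(\phi_0) \, ds \, dt \\
&= \int_\Omega \int_\Omega (\alpha \wedge \alpha)_{st} (\nabla \wedge \nabla)_{st} f(\phi_0) \, ds \, dt \\
&= \int_{\Omega^2} (\alpha \wedge \alpha)_x (\nabla \wedge \nabla)_x \, dx \; f(\phi_0) \\
&= (\alpha \wedge \alpha) \cdot (\nabla \wedge \nabla) f(\phi_0)
\end{aligned}$$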

Now we can see how the general case goes. First we define the $k$-fold outer product:

$$\begin{aligned}
\phi^{[1]} &= \phi \\
\phi^{[k+1]} &= \phi \wedge \phi^{[k]}
\end{aligned} \qquad (20)$$



Then,

$$\nabla_\alpha^k f(\phi) = \alpha^{[k]} \cdot \nabla^{[k]} f(\phi) \qquad (21)$$



The $n$-term Taylor series then becomes

$$f(\phi) \approx \sum_{k=0}^{n} \frac{1}{k!} (\phi - \phi_0)^{[k]} \cdot \nabla^{[k]} f(\phi_0) \qquad (22)$$



Since $\phi_0$ is fixed, we can compute each $\nabla^{[k]} f(\phi_0)$ once, when the field computer is programmed. Then, for
any given input $\phi$ we can compute $(\phi - \phi_0)^{[k]}$ and take the inner product of this with $\nabla^{[k]} f(\phi_0)$. Thus, in
addition to the components mentioned above, computing the Taylor series approximation also requires
outer and inner product units that will accommodate spaces up to those in $\Phi(\Omega^n)$.
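To see the separation of fixed and variable parts in a concrete case, consider a hypothetical quadratic functional $f(\phi) = \int_\Omega \int_\Omega K_{st} \, \phi_s \phi_t \, ds \, dt$ with a fixed kernel $K$. Its first gradient is $[\nabla f(\phi_0)]_s = \int_\Omega (K_{st} + K_{ts}) \, \phi_{0,t} \, dt$, and its second gradient field $\nabla^{[2]} f$ has entries $K_{st} + K_{ts}$ independent of $\phi_0$, so Eq. (22) with $n = 2$ is exact. In the sketch below (midpoint discretization of $\Omega = [0, 1]$, with an arbitrary kernel chosen only for illustration), both gradient fields are precomputed once; each new input then needs only a field subtraction, an outer product, and inner products.

```python
import math

N = 40
DT = 1.0 / N
TS = [(i + 0.5) * DT for i in range(N)]

# Arbitrary fixed kernel on Omega^2, chosen only for illustration.
K = [[math.exp(-abs(s - t)) * (1.0 + s) for t in TS] for s in TS]


def f(phi):
    """Quadratic functional: double integral of K_st * phi_s * phi_t."""
    return sum(K[i][j] * phi[i] * phi[j] for i in range(N) for j in range(N)) * DT * DT


def outer(a, b):
    """(a ^ b)_st = a_s * b_t, a field on Omega^2 (Eq. 19)."""
    return [[x * y for y in b] for x in a]


def inner1(a, b):
    """Inner product of two fields on Omega."""
    return sum(x * y for x, y in zip(a, b)) * DT


def inner2(A, B):
    """Inner product of two fields on Omega^2 (area element DT * DT)."""
    return sum(A[i][j] * B[i][j] for i in range(N) for j in range(N)) * DT * DT


# Fixed part: computed once, "when the field computer is programmed".
phi0 = [math.sin(3.0 * t) for t in TS]
grad0 = [sum((K[s][t] + K[t][s]) * phi0[t] for t in range(N)) * DT for s in range(N)]
grad2 = [[K[s][t] + K[t][s] for t in range(N)] for s in range(N)]


def taylor2(phi):
    """Eq. (22) with n = 2: only the alpha-dependent parts are computed per input."""
    alpha = [p - q for p, q in zip(phi, phi0)]
    return f(phi0) + inner1(alpha, grad0) + 0.5 * inner2(outer(alpha, alpha), grad2)


phi = [t * t - 0.2 for t in TS]

if __name__ == "__main__":
    print(f(phi), taylor2(phi))
```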

We consider a very simple example of Taylor series approximation. Suppose we want to approximate
$\operatorname{defint} \phi$, which computes the definite integral of $\phi$, $\operatorname{defint} \phi = \int_\Omega \phi_s \, ds$. First we determine its partial
derivative at t by observing: 
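$$\begin{aligned}
\lim_{h \to 0} \frac{\operatorname{defint}(\phi + h\delta_t) - \operatorname{defint} \phi}{h}
&= \lim_{h \to 0} \frac{\int_\Omega [\phi_s + h\delta_t(s)] \, ds - \int_\Omega \phi_s \, ds}{h} \\
&= \lim_{h \to 0} \frac{\int_\Omega \phi_s \, ds + h \int_\Omega \delta_t(s) \, ds - \int_\Omega \phi_s \, ds}{h} \\
&= \lim_{h \to 0} h \|\delta_t\| / h \\
&= 1
\end{aligned}$$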



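Thus, $\frac{\partial}{\partial \delta_t} \operatorname{defint} \phi = 1$, and we can see that

$$\nabla \operatorname{defint} \phi = \mathbf{1}, \qquad (23)$$

where $\mathbf{1}$ is the constant 1 function, $\mathbf{1}_t = 1$. This leads to a one term Taylor series, which is exact:

$$\operatorname{defint} \phi = \phi \cdot \mathbf{1} \qquad (24)$$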



Note that 1 is a fixed field that must be loaded into the computer. 
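A minimal numerical check of (23) and (24), assuming a midpoint discretization of $\Omega = [0, 1]$ and the spike representation of $\delta_t$ (height $1/\Delta t$ at one sample). Here `defint` is just a Riemann sum, and `ONE` plays the role of the stored constant field $\mathbf{1}$.

```python
N = 1000
DT = 1.0 / N
TS = [(i + 0.5) * DT for i in range(N)]
ONE = [1.0] * N  # the fixed field 1, loaded once


def inner(a, b):
    return sum(x * y for x, y in zip(a, b)) * DT


def defint(phi):
    """Riemann-sum approximation of the definite integral of phi over [0, 1]."""
    return sum(phi) * DT


def delta(j):
    """Discrete stand-in for delta_t at sample j: height 1/DT, unit area."""
    return [1.0 / DT if i == j else 0.0 for i in range(N)]


def partial(func, phi, j, h=1e-3):
    """Difference quotient for d func / d delta_t at sample j."""
    d = delta(j)
    return (func([p + h * q for p, q in zip(phi, d)]) - func(phi)) / h


phi = [t * t for t in TS]

if __name__ == "__main__":
    print(defint(phi), inner(phi, ONE), partial(defint, phi, 17))
```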

3.4 Transformations on Scalar and Vector Fields 

The previous results apply to scalar valued functions of scalar fields. These kinds of functions are useful 
(e.g., to compute the average value of a scalar field), but they do not exploit the full parallelism of a field 
computer. Achieving this requires the use of functions that accept a (scalar or vector) field as input, and 
return a field as output. We briefly sketch the theory for scalar field valued functions of scalar fields; 
transformations on vector fields are an easy extension of this. 

By a scalar field valued transformation of scalar fields we mean a function $F: \Phi(\Omega_1) \to \Phi(\Omega_2)$. Such a
transformation is considered a family of scalar valued functions $f_t: \Phi(\Omega_1) \to \mathbb{R}$ for each $t \in \Omega_2$; these are
the component functions of $F$. Note that $F$ can be expressed in terms of its components:

$$F(\phi) = \int_{\Omega_2} f_t(\phi) \, \omega_t \, dt \qquad (25)$$

More briefly, $F = \int_{\Omega_2} f_t \, \omega_t \, dt$. $F$ is decomposed into its components by $\delta_t \cdot F(\phi) = f_t(\phi)$.



Next we turn to the differentiability of field transformations. To define this it is necessary to first
define an analog to the inner product for fields of different dimension. Thus, we define the
continuous-dimensional analogue of a vector-matrix product:

$$\begin{aligned}
(\Psi \phi)_s &= \int_\Omega \Psi_{st} \, \phi_t \, dt \\
(\phi \Psi)_t &= \int_\Omega \phi_s \, \Psi_{st} \, ds
\end{aligned} \qquad (26)$$

where the transpose of a field is defined by $\Psi^{\mathrm{T}}_{ts} = \Psi_{st}$. With this notation differentiability can be defined.

Definition (Differentiability of Field Transformations): Suppose $F: \Phi(\Omega_1) \to \Phi(\Omega_2)$ is a field
valued function of scalar fields defined on a neighborhood of $\phi \in \Phi(\Omega_1)$. We say that $F$ is differentiable at
$\phi$ provided there is a field $\Gamma \in \Phi(\Omega_1 \times \Omega_2)$ such that for all $\alpha$ in the neighborhood

$$F(\phi + \alpha) - F(\phi) = \alpha\Gamma + H\|\alpha\| \qquad (27)$$

where $H \in \Phi(\Omega_2)$ and $\|H\| \to 0$ as $\|\alpha\| \to 0$. We will show that $\Gamma$ is the gradient of $F$ at $\phi$.



Next we consider the directional derivative of a field transformation. For a scalar function $f$, $\nabla_\alpha f(\phi)$
is a scalar that describes how much $f(\phi)$ changes when its argument is perturbed by a small amount in the
“direction” $\alpha$. For a field transformation $F$, $\nabla_\alpha F(\phi)$ should be a field, each component of which reflects
how much the corresponding component of $F(\phi)$ changes when $\phi$ moves in the “direction” $\alpha$. That is,
$[\nabla_\alpha F(\phi)]_t = \nabla_\alpha \delta_t \cdot F(\phi)$. Hence,

$$\nabla_\alpha F(\phi) = \int_{\Omega_2} \omega_t \, \nabla_\alpha \delta_t \cdot F(\phi) \, dt, \qquad (28)$$

or, more briefly, $\nabla_\alpha F = \int_{\Omega_2} \omega_t \, \nabla_\alpha \delta_t \cdot F \, dt$. The directional derivative is defined directly by a limit.



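Definition (Directional Derivative of Field Transformations): If $F: \Phi(\Omega_1) \to \Phi(\Omega_2)$ is
differentiable at $\phi$ then

$$\nabla_\alpha F(\phi) = \frac{\partial F(\phi)}{\partial \alpha} = \lim_{h \to 0} \frac{F(\phi + h\alpha) - F(\phi)}{h} \qquad (29)$$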



It is obvious that $\nabla_\alpha F(\phi) = \alpha\Gamma$, where $\Gamma$ is the field whose existence is guaranteed by differentiability.
This suggests the definition:

Definition (Gradient of Field Transformation): The gradient of $F$ at $\phi$ is the field whose value at a
point $t$ is the partial derivative of $F$ at that point:

$$[\nabla F(\phi)]_t = \frac{\partial}{\partial \delta_t} F(\phi) \qquad (30)$$



Since $\nabla_{\delta_t} F(\phi) = \delta_t \Gamma = \Gamma_t$, it follows that $\nabla F(\phi) = \Gamma$, as expected.



Notice that the gradient is of higher dimension than the field transformation. That is, if
$F: \Phi(\Omega_1) \to \Phi(\Omega_2)$, then $\nabla F: \Phi(\Omega_1) \to \Phi(\Omega_1 \times \Omega_2)$. Higher order gradients will have similarly higher
order:

$$\nabla^k F: \Phi(\Omega_1) \to \Phi(\Omega_1^k \times \Omega_2), \quad \text{for } F: \Phi(\Omega_1) \to \Phi(\Omega_2) \qquad (31)$$



A derivation similar to that for scalar field functions yields the Taylor series for field transformations:

$$F(\phi) = \sum_{k=0}^{\infty} \frac{1}{k!}\, (\phi - \phi_0)^k\, \nabla^k F(\phi_0) \tag{32}$$

As before, outer products can be used to separate the variable and fixed components.

We illustrate the Taylor series computation for a simple but important class of field transformations, 
the integral operators. For example, the derivative and difference transformations, which are very useful in 
image processing, are integral operators, as will be shown below. 



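Definition (Integral Operator): A field transformation $F: \Phi(\Omega_1) \to \Phi(\Omega_2)$ is an integral operator if there is a field $\Psi \in \Phi(\Omega_2 \times \Omega_1)$, called its kernel, such that

$$F(\phi) = \Psi\phi \tag{33}$$

Recall $(\Psi\phi)_s = \int_{\Omega_1} \Psi_{st}\, \phi_t\, dt$.

Since $F(\phi + \psi) = \Psi(\phi + \psi) = \Psi\phi + \Psi\psi = F(\phi) + F(\psi)$, and likewise $F(c\phi) = cF(\phi)$ for any scalar $c$, integral operators are linear. Thus, their gradients are especially easy to compute: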



Theorem (Gradient of Integral Operator): The gradient of an integral operator is the transpose of its kernel.

Proof: Suppose $F(\phi) = \Psi\phi$ is an integral operator. Observe $F(\phi + \alpha) - F(\phi) = \Psi\alpha = \alpha\Psi^{\mathrm T}$, where $(\Psi^{\mathrm T})_{ts} = \Psi_{st}$, so Eq. 27 holds with $H = 0$. Hence $\nabla F(\phi) = \Psi^{\mathrm T}$. ■

Since the gradient is constant, and the gradient of a constant function is zero, all higher order gradients vanish, so the Taylor series for an integral operator is exact: $F(\phi) = F(0) + \phi\Psi^{\mathrm T} = \phi\Psi^{\mathrm T}$.
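Under an assumed sampled discretization (finite point sets, integrals as `dt`-weighted sums), the following sketch checks the theorem with an arbitrary random kernel that is illustrative test data, not anything from the paper: for $F(\phi) = \Psi\phi$, the difference $F(\phi + \alpha) - F(\phi)$ equals $\alpha\Psi^{\mathrm T}$ with no remainder, and the first-order Taylor series $\phi\Psi^{\mathrm T}$ reproduces $F(\phi)$.

```python
import random

def apply_kernel(K, phi, dt):
    """(K phi)_s = integral over t of K_{st} * phi_t."""
    return [sum(k * p for k, p in zip(row, phi)) * dt for row in K]

def left_product(alpha, T, dt):
    """(alpha T)_u = integral over s of alpha_s * T_{su}."""
    return [sum(alpha[s] * T[s][u] for s in range(len(alpha))) * dt
            for u in range(len(T[0]))]

def transpose(K):
    """(K^T)_{ts} = K_{st}."""
    return [list(col) for col in zip(*K)]

random.seed(0)
n1, n2, dt = 4, 3, 0.5
# Kernel in Phi(Omega_2 x Omega_1): n2 rows, n1 columns (random test values).
Psi = [[random.uniform(-1, 1) for _ in range(n1)] for _ in range(n2)]

def F(phi):
    return apply_kernel(Psi, phi, dt)

phi = [random.uniform(-1, 1) for _ in range(n1)]
alpha = [random.uniform(-1, 1) for _ in range(n1)]

# F is linear, so F(phi + alpha) - F(phi) = alpha Psi^T exactly (H = 0 in Eq. 27).
diff = [a - b for a, b in zip(F([p + a for p, a in zip(phi, alpha)]), F(phi))]
via_transpose = left_product(alpha, transpose(Psi), dt)
# First-order Taylor series about 0: F(phi) = F(0) + phi Psi^T = phi Psi^T.
taylor = left_product(phi, transpose(Psi), dt)
```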

Next we show that the derivative and difference operations on scalar fields are integral operators, and we compute their Taylor series. First consider the finite difference transformation $\Delta\phi$, defined so that

$$(\Delta\phi)_t = \phi_{t+h} - \phi_t \tag{34}$$

for a given $h > 0$. To see that this is an integral operator, observe:

$$(\Delta\phi)_t = \delta_{t+h} \cdot \phi - \delta_t \cdot \phi = (\delta_{t+h} - \delta_t) \cdot \phi \tag{35}$$

Define the field $\Psi$ by $\Psi_{tu} = \delta_{t+h}(u) - \delta_t(u)$; it follows that $\Delta\phi = \Psi\phi$, since

$$(\Psi\phi)_t = \int_{\Omega} \Psi_{tu}\, \phi_u\, du = \int_{\Omega} [\delta_{t+h}(u) - \delta_t(u)]\, \phi_u\, du = (\delta_{t+h} - \delta_t) \cdot \phi \tag{36}$$

The only trouble with this formula for the finite difference is that the field $\Psi$ is not physically realizable, since it makes use of Dirac functions. In practice we have to replace $\delta_{t+h}$ and $\delta_t$ by finite approximations, but the resulting approximate difference transformation is still an integral operator. The same applies to the derivative transformation, whose kernel is built from the first (or, for higher order derivatives, higher order) derivatives of the Dirac function; these too must be replaced by finite approximations.
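The following sketch builds the finite difference kernel of Eq. 36 on a sampled, periodic domain; the sampling and the periodicity are our assumptions. With discrete Dirac spikes of height `1/dt` it reproduces Eq. 34 exactly. Replacing each Dirac function by a narrow Gaussian, one possible finite approximation chosen here for illustration, still gives an integral operator, whose output stays close to $\Delta\phi$ for a smooth $\phi$.

```python
import math

def difference_kernel(n, shift, dt, sigma=None):
    """Kernel Psi with Psi[t][u] ~ delta_{t+h}(u) - delta_t(u), where h = shift * dt.

    sigma=None: each Dirac function is a discrete spike of height 1/dt
    (exact on the grid, but not physically realizable).
    sigma > 0: each Dirac function is replaced by a Gaussian of width sigma,
    one possible finite approximation (our choice, not the paper's).
    The domain is treated as periodic (also an assumption of this sketch).
    """
    def bump(center, u):
        c = center % n
        if sigma is None:
            return 1.0 / dt if u == c else 0.0
        d = min(abs(u - c), n - abs(u - c)) * dt  # periodic distance
        return math.exp(-d * d / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return [[bump(t + shift, u) - bump(t, u) for u in range(n)] for t in range(n)]

def apply_kernel(K, phi, dt):
    """(K phi)_t = integral over u of K_{tu} * phi_u."""
    return [sum(k * p for k, p in zip(row, phi)) * dt for row in K]

n, shift = 400, 8
dt = 1.0 / n                                    # so h = shift * dt = 0.02
phi = [math.sin(2 * math.pi * t * dt) for t in range(n)]
direct = [phi[(t + shift) % n] - phi[t] for t in range(n)]              # Eq. 34
exact = apply_kernel(difference_kernel(n, shift, dt), phi, dt)          # Dirac kernel
smooth = apply_kernel(difference_kernel(n, shift, dt, sigma=0.01), phi, dt)
```

The width `sigma` trades realizability against accuracy: a wider bump is easier to realize but smooths $\phi$ more before the difference is taken.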



To further illustrate the Taylor series for scalar field valued transformations, we consider pointwise transformations. A pointwise transformation applies a scalar function (but not necessarily the same function) to every element of a field. That is, $F: \Phi(\Omega) \to \Phi(\Omega)$ is a pointwise transformation if for some $f_t: \mathbb{R} \to \mathbb{R}$,

$$[F(\phi)]_t = f_t(\phi_t) \tag{37}$$

Note that

$$\delta_t \cdot F(\phi) = f_t(\phi_t) \tag{38}$$



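Lemma: If $F: \Phi(\Omega) \to \Phi(\Omega)$ is a pointwise transformation, then

$$\nabla_\alpha [\delta_t \cdot F(\phi)] = \delta_t \cdot \alpha F'(\phi) \tag{39}$$

where $[F'(\phi)]_t = f_t'(\phi_t)$ and $f_t'$ is the derivative of $f_t$. Here $\alpha F'(\phi)$ is the pointwise field product.

Proof: By the definition of the derivative:

$$\begin{aligned}
\nabla_\alpha [\delta_t \cdot F(\phi)] &= \lim_{h \to 0} \frac{\delta_t \cdot F(\phi + h\alpha) - \delta_t \cdot F(\phi)}{h} \\
&= \lim_{h \to 0} \frac{f_t(\phi_t + h\alpha_t) - f_t(\phi_t)}{h} \\
&= \lim_{h \to 0} \frac{h\alpha_t f_t'(\phi_t) + \epsilon\, |h\alpha_t|}{h} \\
&= \lim_{h \to 0} \left[ \alpha_t f_t'(\phi_t) + \epsilon\, |h\alpha_t| / h \right] \\
&= \alpha_t f_t'(\phi_t) = \delta_t \cdot \alpha F'(\phi)
\end{aligned}$$

where $\epsilon \to 0$ as $h \to 0$ by the differentiability of $f_t$, and $|h\alpha_t|/h$ is bounded by $|\alpha_t|$. ■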



Theorem (Directional Derivative of Pointwise Transformation): If $F: \Phi(\Omega) \to \Phi(\Omega)$ is a pointwise transformation, then

$$\nabla_\alpha F(\phi) = \alpha F'(\phi) \tag{40}$$

Proof: Applying the preceding lemma:

$$\begin{aligned}
\nabla_\alpha F(\phi) &= \int_\Omega \omega_t\, \nabla_\alpha [\delta_t \cdot F(\phi)]\, dt \\
&= \int_\Omega \omega_t\, \delta_t \cdot \alpha F'(\phi)\, dt \\
&= \alpha \int_\Omega \omega_t\, \delta_t \cdot F'(\phi)\, dt \\
&= \alpha F'(\phi)
\end{aligned}$$

The last step follows by Eq. 12. ■

Corollary: The directional derivative of a pointwise transformation is a pointwise transformation. 

Proof: This follows immediately from the preceding theorem. ■ 
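A small numerical check of Eq. 40, assuming a sampled domain and a hypothetical pointwise transformation $f_t(x) = \sin(c_t x)$ whose coefficient $c_t$ varies with $t$ (so the function applied really differs from point to point). The central difference approximation of $\nabla_\alpha F(\phi)$ should match the pointwise product $\alpha F'(\phi)$.

```python
import math

# Hypothetical coefficients: f_t(x) = sin(c_t * x), so f_t'(x) = c_t * cos(c_t * x).
c = [1.0, 2.0, 0.5, 3.0]

def F(phi):
    """Pointwise transformation, Eq. 37."""
    return [math.sin(ct * p) for ct, p in zip(c, phi)]

def F_prime(phi):
    """[F'(phi)]_t = f_t'(phi_t)."""
    return [ct * math.cos(ct * p) for ct, p in zip(c, phi)]

def directional_derivative(F, phi, alpha, h=1e-6):
    """Eq. 29, with the limit replaced by a central difference quotient."""
    plus = F([p + h * a for p, a in zip(phi, alpha)])
    minus = F([p - h * a for p, a in zip(phi, alpha)])
    return [(x - y) / (2 * h) for x, y in zip(plus, minus)]

phi = [0.3, -0.7, 1.2, 0.05]
alpha = [1.0, -2.0, 0.5, 4.0]
numeric = directional_derivative(F, phi, alpha)
# Eq. 40: nabla_alpha F(phi) = alpha F'(phi), using the pointwise field product.
predicted = [a * d for a, d in zip(alpha, F_prime(phi))]
```

Component $t$ of `predicted` depends only on $\alpha_t$ and $\phi_t$, which is the content of the corollary.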

The theorem and its corollary lead to the Taylor series expansion of a pointwise transformation: 

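Theorem (Taylor Series for Pointwise Transformation): If $F: \Phi(\Omega) \to \Phi(\Omega)$ is a pointwise transformation, and its component functions and all their derivatives through order $n+1$ are continuous in a neighborhood of $\phi$, then for all $\alpha$ such that $\phi + \alpha$ is in that neighborhood there is a field $\theta$, $0 \le \theta_t \le 1$, such that

$$F(\phi + \alpha) = \sum_{k=0}^{n} \frac{1}{k!}\, \alpha^k F^{(k)}(\phi) + \frac{1}{(n+1)!}\, \alpha^{n+1} F^{(n+1)}(\phi + \theta\alpha) \tag{41}$$

Here powers and products of fields are pointwise, and $F^{(k)}$ is the $k$th derivative of $F$:

$$[F^{(k)}(\phi)]_t = f_t^{(k)}(\phi_t) = \left. \frac{d^k f_t(x)}{dx^k} \right|_{x = \phi_t} \tag{42}$$

where $f_t$ is the component function of Eq. 38, $f_t(\phi_t) = \delta_t \cdot F(\phi)$.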



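Proof: By the Taylor theorem for functions of reals:

$$\begin{aligned}
[F(\phi + \alpha)]_t &= f_t(\phi_t + \alpha_t) \\
&= \sum_{k=0}^{n} \frac{1}{k!}\, \alpha_t^k f_t^{(k)}(\phi_t) + \frac{1}{(n+1)!}\, \alpha_t^{n+1} f_t^{(n+1)}(\phi_t + \theta_t\alpha_t) \\
&= \sum_{k=0}^{n} \frac{1}{k!}\, [\alpha^k F^{(k)}(\phi)]_t + \frac{1}{(n+1)!}\, [\alpha^{n+1} F^{(n+1)}(\phi + \theta\alpha)]_t
\end{aligned}$$

■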
This result permits converting Taylor series for scalar functions into Taylor series for the corresponding pointwise field transformations. Notice also that the resulting series do not require outer products, provided we have a field multiplier, $(\phi\psi)_t = \phi_t \psi_t$.
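The following sketch evaluates the truncated series of Eq. 41 using only pointwise products, as just noted. The component functions $f_t(x) = \exp(c_t x)$ and their coefficients are hypothetical. Because each $c_t > 0$ makes $f_t^{(n+1)}$ increasing, the remainder term is at most $|\alpha_t|^{n+1}/(n+1)!$ times the larger of $f_t^{(n+1)}(\phi_t)$ and $f_t^{(n+1)}(\phi_t + \alpha_t)$, which the code uses as a check.

```python
import math

# Hypothetical component functions f_t(x) = exp(c_t * x); f_t^(k)(x) = c_t**k * exp(c_t * x).
c = [0.5, 1.0, 2.0]

def F_k(phi, k):
    """[F^(k)(phi)]_t = f_t^(k)(phi_t), Eq. 42."""
    return [ct ** k * math.exp(ct * p) for ct, p in zip(c, phi)]

def taylor_pointwise(phi, alpha, n):
    """Sum_{k=0}^{n} alpha^k F^(k)(phi) / k!, with pointwise powers and products."""
    total = [0.0] * len(phi)
    for k in range(n + 1):
        dk = F_k(phi, k)
        total = [s + a ** k * d / math.factorial(k) for s, a, d in zip(total, alpha, dk)]
    return total

phi = [0.2, -0.4, 0.1]
alpha = [0.3, 0.25, -0.2]
n = 6
approx = taylor_pointwise(phi, alpha, n)
exact = F_k([p + a for p, a in zip(phi, alpha)], 0)   # F(phi + alpha)
# Lagrange remainder bound: worst case of f_t^(n+1) over the segment [phi_t, phi_t + alpha_t].
bound = [abs(a) ** (n + 1) / math.factorial(n + 1) * ct ** (n + 1)
         * max(math.exp(ct * p), math.exp(ct * (p + a)))
         for ct, p, a in zip(c, phi, alpha)]
```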

We illustrate this theorem by determining a Taylor series approximation for $\ln$, the function that computes the natural logarithm of each field element.

Theorem (Taylor Series for Pointwise Logarithm): Suppose $(\ln\phi)_t = \ln\phi_t$. Then

$$\ln\phi = (\phi - 1) - \frac{1}{2}(\phi - 1)^2 + \frac{1}{3}(\phi - 1)^3 - \cdots + \frac{(-1)^{n-1}}{n}(\phi - 1)^n + \frac{(-1)^n}{n+1}\, \frac{(\phi - 1)^{n+1}}{[1 + \theta(\phi - 1)]^{n+1}} \tag{43}$$

provided $|\phi_t - 1| \le 1$ and $\phi_t \ne 0$ for all $t \in \Omega$.



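Proof: Note that for $k > 0$, $\ln^{(k)} x = (-1)^{k-1}(k-1)!/x^k$. Therefore, for $k \ge 1$, $\ln^{(k)} 1 = (-1)^{k-1}(k-1)!$. By the Taylor theorem,

$$\begin{aligned}
\ln(1 + \alpha) &= \ln 1 + \sum_{k=1}^{n} \frac{1}{k!}\, \alpha^k \ln^{(k)}(1) + \frac{1}{(n+1)!}\, \alpha^{n+1} \ln^{(n+1)}(1 + \theta\alpha) \\
&= \sum_{k=1}^{n} \frac{(-1)^{k-1}(k-1)!}{k!}\, \alpha^k + \frac{1}{(n+1)!}\, \alpha^{n+1} \ln^{(n+1)}(1 + \theta\alpha) \\
&= \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k}\, \alpha^k + \frac{(-1)^n}{n+1}\, \frac{\alpha^{n+1}}{(1 + \theta\alpha)^{n+1}}
\end{aligned}$$

since $\ln^{(n+1)}(1 + \theta\alpha) = (-1)^n n!/(1 + \theta\alpha)^{n+1}$. To prove the theorem, let $\alpha = \phi - 1$. ■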
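A sketch of Eq. 43 applied elementwise to a sampled field (the sample values are arbitrary). Because $1 + \theta_t(\phi_t - 1)$ lies between 1 and $\phi_t$, the remainder is at most $|\phi_t - 1|^{n+1} / \big((n+1)\min(1, \phi_t)^{n+1}\big)$, which the code computes alongside the truncated series.

```python
import math

def log_taylor(phi, n):
    """Pointwise ln(phi) ~ sum_{k=1}^{n} (-1)**(k-1) * (phi - 1)**k / k (Eq. 43 without remainder).
    Intended for |phi_t - 1| <= 1 and phi_t != 0, as the theorem requires."""
    return [sum((-1) ** (k - 1) * (p - 1) ** k / k for k in range(1, n + 1)) for p in phi]

def remainder_bound(phi, n):
    """Upper bound on |remainder|: 1 + theta*(phi_t - 1) lies between 1 and phi_t."""
    return [abs(p - 1) ** (n + 1) / ((n + 1) * min(1.0, p) ** (n + 1)) for p in phi]

phi = [0.6, 0.9, 1.0, 1.3, 1.7]
n = 12
approx = log_taylor(phi, n)
exact = [math.log(p) for p in phi]
bound = remainder_bound(phi, n)
```

As with scalar series, convergence is slowest near the edges of the allowed range, $\phi_t$ close to 0 or 2.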

We consider vector fields briefly. Recall that any three-dimensional vector field $\boldsymbol{\phi}$ can be considered as three scalar fields $\phi$, $\psi$, $\chi$, where

$$\boldsymbol{\phi}_t = \phi_t\,\mathbf{i} + \psi_t\,\mathbf{j} + \chi_t\,\mathbf{k} \tag{44}$$

Similarly, a function that returns a three-dimensional vector field can be broken down into three functions that return scalar fields. Thus, a transformation on finite dimensional vector fields can be implemented by a finite number of transformations on scalar fields.
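To illustrate the decomposition of Eq. 44, the sketch below stores a sampled three-dimensional vector field as three scalar fields and implements one hypothetical vector-field transformation, a pointwise cross product with a fixed vector $\mathbf{w}$, as three transformations on scalar fields.

```python
def cross_with(w):
    """Return three scalar-field transformations computing v x w componentwise."""
    wx, wy, wz = w
    def comp_i(phi, psi, chi):  # i component: v_y w_z - v_z w_y
        return [y * wz - z * wy for y, z in zip(psi, chi)]
    def comp_j(phi, psi, chi):  # j component: v_z w_x - v_x w_z
        return [z * wx - x * wz for x, z in zip(phi, chi)]
    def comp_k(phi, psi, chi):  # k component: v_x w_y - v_y w_x
        return [x * wy - y * wx for x, y in zip(phi, psi)]
    return comp_i, comp_j, comp_k

def apply_vector_transform(components, field):
    """Apply each scalar-field transformation to the component fields (phi, psi, chi)."""
    phi, psi, chi = field
    return tuple(f(phi, psi, chi) for f in components)

# Three sample points of a vector field, stored as its i, j, k component fields.
phi = [1.0, 0.0, 2.0]
psi = [0.0, 1.0, -1.0]
chi = [0.0, 0.0, 0.5]
result = apply_vector_transform(cross_with((0.0, 0.0, 1.0)), (phi, psi, chi))
```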

To ensure the continuity of field valued functions, certain restrictions must be placed on the fields permitted as arguments. Although these restrictions are still under investigation, we believe that it is sufficient that the input field’s gradient be bounded at each stage. This will be the case for all physically realizable fields. This restriction on allowable inputs has an analogy in digital computers: legal input numbers are restricted to some range, and numbers outside that range may cause underflow or overflow in the subsequent computation. In the same way, fields whose gradients are too large may lead to incorrect results.

4. EXAMPLE APPLICATION: BIDIRECTIONAL ASSOCIATIVE FIELD MEMORY 

In this section we illustrate the theory of field computation by analyzing a continuous version of Kosko’s bidirectional associative memory.⁶ The system operates as follows (see Figure 1).

[Figure 1. Bidirectional Associative Field Memory]

The goal is to store a number of associative pairs $(\psi^{(k)}, \phi^{(k)})$, where $\phi^{(k)} \in \Phi(\Omega_1)$ and $\psi^{(k)} \in \Phi(\Omega_2)$. (We assume these fields are bipolar, that is, take values in $\{-1, +1\}$, although physical realizability actually requires continuous variation between $-1$ and $+1$.) Presentation of a field $\phi \in \Phi(\Omega_1)$ at $\mathrm{in}_1$ eventually yields at $\mathrm{out}_1$ and $\mathrm{out}_2$ the pair $(\psi^{(j)}, \phi^{(j)})$ for which $\phi^{(j)}$ is the closest match for $\phi$. Similarly, presentation of $\psi \in \Phi(\Omega_2)$ at $\mathrm{in}_2$ will yield the pair for which $\psi^{(j)}$ is the closest match for $\psi$. The pairs are stored in a distributed fashion in the field $\Psi \in \Phi(\Omega_2 \times \Omega_1)$, computed as follows:

$$\Psi = \psi^{(1)} \wedge \phi^{(1)} + \cdots + \psi^{(n)} \wedge \phi^{(n)} \tag{45}$$

Note that $\psi^{(k)} \wedge \phi^{(k)}$ reflects the cross correlations between the values of $\psi^{(k)}$ and the values of $\phi^{(k)}$.



Two of the boxes in Fig. 1 perform “matrix-vector” multiplications, $\Psi\phi$ or $\psi\Psi$. Thus presentation of a field $\phi$ at $\mathrm{in}_1$ yields

$$\psi' = S(\psi, \Psi\phi) \tag{46}$$

at $\mathrm{out}_1$. Here $S$ is a nonlinear function that helps to suppress crosstalk between stored pairs by forcing field values back to $-1$ or $+1$ (it is described below). Computation of $\psi'$ in turn yields

$$\phi' = S(\phi, \psi'\Psi) \tag{47}$$

at $\mathrm{out}_2$. On the next iteration we get $\psi'' = S(\psi', \Psi\phi')$ and $\phi'' = S(\phi', \psi''\Psi)$, and so forth. Each iteration will yield a closer approximation of the desired $(\psi^{(k)}, \phi^{(k)})$. We cannot in general guarantee that the system will stabilize (i.e., that $\phi$ and $\psi$ will become constant), but we will show that the changes will become as small as we like. Kosko can show stability, since the discreteness of his fields places a lower bound on nonzero change.



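The nonlinear function $S$ updates the output field in the following way (illustrated for the computation $\psi' = S(\psi, \Psi\phi)$):

$$\psi_t' = \begin{cases} +1 & \text{if } \Psi_t \cdot \phi > \theta_1 \\ \psi_t & \text{if } -\theta_2 \le \Psi_t \cdot \phi \le \theta_1 \\ -1 & \text{if } \Psi_t \cdot \phi < -\theta_2 \end{cases} \tag{48}$$

Here $\Psi_t \in \Phi(\Omega_1)$ denotes the field with values $(\Psi_t)_u = \Psi_{tu}$, so that $\Psi_t \cdot \phi = (\Psi\phi)_t$. Thus, the value of a field element $\psi_t$ is not changed if $-\theta_2 \le \Psi_t \cdot \phi \le \theta_1$, where the thresholds $\theta_1, \theta_2 > 0$. The rule for $\phi' = S(\phi, \psi\Psi)$ is analogous.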
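The memory can be simulated on a discretized domain. This sketch makes several assumptions of its own, not the paper's: $\Omega_1$ and $\Omega_2$ are finite point sets with unit measure (so $|\Omega_i|$ is the number of points), the same thresholds are used in both directions, and the stored fields are Walsh patterns, whose exact orthogonality makes the crosstalk terms vanish. It implements Eqs. 45 to 48 directly and shows a stored pair as a fixed point and recall from a corrupted key.

```python
def walsh(i, n):
    """Row i of an n-point Sylvester-Hadamard (Walsh) pattern, n a power of two."""
    return [-1.0 if bin(i & t).count("1") % 2 else 1.0 for t in range(n)]

def store(pairs):
    """Eq. 45: Psi[s][t] = sum over k of psi^(k)_s * phi^(k)_t."""
    n2, n1 = len(pairs[0][0]), len(pairs[0][1])
    return [[sum(psi_k[s] * phi_k[t] for psi_k, phi_k in pairs) for t in range(n1)]
            for s in range(n2)]

def forward(Psi, phi):
    """(Psi phi)_s = sum over t of Psi[s][t] * phi[t]."""
    return [sum(w * p for w, p in zip(row, phi)) for row in Psi]

def backward(psi, Psi):
    """(psi Psi)_t = sum over s of psi[s] * Psi[s][t]."""
    return [sum(psi[s] * Psi[s][t] for s in range(len(psi))) for t in range(len(Psi[0]))]

def S(old, drive, upper, lower):
    """Eq. 48: +1 above `upper`, -1 below `-lower`, otherwise keep the old value."""
    return [1.0 if d > upper else -1.0 if d < -lower else o for o, d in zip(old, drive)]

def recall(Psi, phi, psi, theta1, theta2, max_steps=20):
    """Alternate Eqs. 46 and 47 until nothing changes or max_steps is reached."""
    for _ in range(max_steps):
        new_psi = S(psi, forward(Psi, phi), theta1, theta2)       # Eq. 46
        new_phi = S(phi, backward(new_psi, Psi), theta1, theta2)  # Eq. 47
        if new_psi == psi and new_phi == phi:
            break
        psi, phi = new_psi, new_phi
    return psi, phi

n1, n2, theta = 64, 32, 4.0
pairs = [(walsh(k + 4, n2), walsh(k + 1, n1)) for k in range(3)]  # hypothetical patterns
Psi = store(pairs)

# A stored pair is a fixed point of the iteration.
stable = recall(Psi, pairs[0][1], pairs[0][0], theta, theta)

# A corrupted key: phi^(1) with its first 8 elements flipped; psi starts at zero.
noisy = [-x if t < 8 else x for t, x in enumerate(pairs[0][1])]
psi_out, phi_out = recall(Psi, noisy, [0.0] * n2, theta, theta)
```

With random rather than orthogonal patterns the crosstalk terms are nonzero, and recall then depends on how many pairs are stored relative to $|\Omega_1|$ and $|\Omega_2|$.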

Following Hopfield, we show that the stored pairs $(\psi^{(k)}, \phi^{(k)})$ are stable states in the dynamic behavior of the memory.

Theorem (Stability of Stored Pairs): Any stored pair is stable, provided $|\Omega_1| > \theta$, where $\theta = \max(\theta_1, \theta_2)$. (This condition holds for realistic values of the thresholds.)



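Proof: Suppose that $(\psi^{(j)}, \phi^{(j)})$ is one of the stored pairs and observe:

$$\Psi\phi^{(j)} = \Big(\sum_k \psi^{(k)} \wedge \phi^{(k)}\Big)\phi^{(j)} = \sum_k (\psi^{(k)} \wedge \phi^{(k)})\phi^{(j)} = \sum_k \psi^{(k)}\, (\phi^{(k)} \cdot \phi^{(j)})$$

The expression $\phi^{(k)} \cdot \phi^{(j)}$ measures the correlation between $\phi^{(k)}$ and $\phi^{(j)}$, which varies between $-|\Omega_1|$ and $+|\Omega_1|$. If the stored fields are uncorrelated, then for $j \ne k$ this expression has mean value 0. On the other hand, when $j = k$ its value is $|\Omega_1|$. Hence,

$$\Psi\phi^{(j)} \approx \psi^{(j)}\, |\Omega_1| \tag{49}$$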



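Now consider the crucial expression from Eq. 48:

$$\Psi_t \cdot \phi^{(j)} \approx \psi_t^{(j)}\, |\Omega_1| \tag{50}$$

If $\psi_t^{(j)} = +1$, then $\Psi_t \cdot \phi^{(j)} \approx |\Omega_1| > \theta_1$, and so $\psi_t' = +1$. Similarly, if $\psi_t^{(j)} = -1$, then $\Psi_t \cdot \phi^{(j)} \approx -|\Omega_1| < -\theta_2$, and so $\psi_t' = -1$. In both cases $\psi_t' = \psi_t^{(j)}$. Since $\psi' = \psi^{(j)}$ and (by similar reasoning) $\phi' = \phi^{(j)}$, we see that any stored pair $(\psi^{(j)}, \phi^{(j)})$ is stable. ■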



The preceding theorem shows that the stored pairs are stable, but it does not show that they will ever be reached. Therefore we prove several theorems establishing conditions under which correct recall takes place. The following theorem shows that if $\phi$ is sufficiently close to one $\phi^{(j)}$ and sufficiently far from all the others, then perfect recall occurs in one step.



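Theorem (Close Matches Lead to Correct Recall): $\psi' = \psi^{(j)}$ provided that

$$\phi^{(j)} \cdot \phi - \theta > \sum_{k \ne j} |\phi^{(k)} \cdot \phi| \tag{51}$$

where $\theta = \max(\theta_1, \theta_2)$. Similarly, $\phi' = \phi^{(j)}$ provided $\psi^{(j)} \cdot \psi - \theta > \sum_{k \ne j} |\psi^{(k)} \cdot \psi|$.

Proof: For notational convenience let $\sigma_k = \phi^{(k)} \cdot \phi$ be the similarity of $\phi^{(k)}$ and $\phi$. Thus we are assuming

$$\sigma_j - \theta > \sum_{k \ne j} |\sigma_k| \tag{52}$$

We need to show $\psi' = \psi^{(j)}$, so consider

$$\Psi\phi = \Big(\sum_k \psi^{(k)} \wedge \phi^{(k)}\Big)\phi = \sum_k (\psi^{(k)} \wedge \phi^{(k)})\phi = \sum_k \psi^{(k)}\, (\phi^{(k)} \cdot \phi) = \sum_k \sigma_k\, \psi^{(k)} \tag{53}$$

This equation has a natural interpretation: each $\psi^{(k)}$ contributes to the result to the extent that the corresponding $\phi^{(k)}$ is similar to $\phi$. From Eq. 53 we have

$$\Psi_t \cdot \phi = \sigma_j\, \psi_t^{(j)} + \sum_{k \ne j} \sigma_k\, \psi_t^{(k)} \tag{54}$$

Now we consider the cases $\psi_t^{(j)} = +1$ and $\psi_t^{(j)} = -1$. If $\psi_t^{(j)} = +1$, then, since $|\psi_t^{(k)}| = 1$,

$$\Psi_t \cdot \phi = \sigma_j + \sum_{k \ne j} \sigma_k\, \psi_t^{(k)} \ge \sigma_j - \sum_{k \ne j} |\sigma_k| > \theta \ge \theta_1$$

Hence $\psi_t' = +1 = \psi_t^{(j)}$. If $\psi_t^{(j)} = -1$, then

$$\Psi_t \cdot \phi = -\sigma_j + \sum_{k \ne j} \sigma_k\, \psi_t^{(k)} \le -\sigma_j + \sum_{k \ne j} |\sigma_k| < -\theta \le -\theta_2$$

Hence $\psi_t' = -1 = \psi_t^{(j)}$. In either case $\psi_t' = \psi_t^{(j)}$. The proof that $\phi' = \phi^{(j)}$ is analogous. ■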

The preceding proof makes very pessimistic assumptions, namely, that every interference term $\sigma_k \psi_t^{(k)}$ from the other pairs opposes $\psi_t^{(j)}$, and thus interferes maximally. In the more likely case, where they are uncorrelated, or even better, where closeness in the $\phi^{(k)}$ implies closeness in the $\psi^{(k)}$, we can get correct recall in spite of the other $\phi^{(k)}$ being similar to $\phi$.
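A sketch that checks the sufficient condition of Eq. 51 for a probe field, under the same assumed discretization (finite domains with unit measure, bipolar fields, hypothetical Walsh-based patterns). The second stored key is deliberately correlated with the first, so $\sigma_2 \ne 0$; the condition still holds, and a single update gives $\psi' = \psi^{(1)}$, as the theorem predicts.

```python
def walsh(i, n):
    """Row i of an n-point Sylvester-Hadamard (Walsh) pattern, n a power of two."""
    return [-1.0 if bin(i & t).count("1") % 2 else 1.0 for t in range(n)]

def dot(a, b):
    """Inner product with unit measure."""
    return sum(x * y for x, y in zip(a, b))

def store(pairs):
    """Eq. 45: Psi[s][t] = sum over k of psi^(k)_s * phi^(k)_t."""
    return [[sum(psi_k[s] * phi_k[t] for psi_k, phi_k in pairs)
             for t in range(len(pairs[0][1]))] for s in range(len(pairs[0][0]))]

def S(old, drive, upper, lower):
    """Eq. 48: +1 above `upper`, -1 below `-lower`, otherwise keep the old value."""
    return [1.0 if d > upper else -1.0 if d < -lower else o for o, d in zip(old, drive)]

n1, n2, theta1, theta2 = 64, 32, 4.0, 4.0
theta = max(theta1, theta2)
phi1 = walsh(1, n1)
phi2 = [-x if t < 16 else x for t, x in enumerate(phi1)]  # deliberately correlated with phi1
phi3 = walsh(2, n1)
pairs = [(walsh(k + 4, n2), phi_k) for k, phi_k in enumerate([phi1, phi2, phi3])]
Psi = store(pairs)

probe = [-x if t >= 60 else x for t, x in enumerate(phi1)]  # phi1 with 4 elements flipped
sigma = [dot(phi_k, probe) for _, phi_k in pairs]           # similarities sigma_k
j = 0
condition = sigma[j] - theta > sum(abs(s) for k, s in enumerate(sigma) if k != j)  # Eq. 51
psi_prime = S([0.0] * n2, [dot(row, probe) for row in Psi], theta1, theta2)
```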



Our next goal is to show that the system converges to a stable state. To accomplish this, following Kosko and Hopfield, we investigate how a Lyapunov function for the system changes in time. The function is defined:



$$E(\psi, \phi) = -\tfrac{1}{2}\, \psi \cdot (\Psi\phi - 2\theta_1) - \tfrac{1}{2}\, \phi \cdot (\psi\Psi - 2\theta_2) \tag{55}$$



(We write $\theta_i$ ambiguously for the constant field $\theta_i \mathbf{1}$.) This formula can be understood by observing that $\psi \cdot \Psi\phi$ measures the correlation between $\psi$ and $\Psi\phi$, and $\phi \cdot \psi\Psi$ measures the correlation between $\phi$ and $\psi\Psi$. Thus $E$ represents the “energy” of the system, which decreases as these correlations increase. Eq. 55 can be simplified to:



$$E(\psi, \phi) = -\psi\Psi\phi + \psi \cdot \theta_1 + \phi \cdot \theta_2 \tag{56}$$

where we write $\psi\Psi\phi = \psi \cdot \Psi\phi = \psi\Psi \cdot \phi$. We now use this energy function to prove some important results.



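Theorem (Monotonicity of Change of Energy): Changes of state always decrease the energy. That is, $\Delta E(\psi) < 0$ and $\Delta E(\phi) < 0$.

Proof: The change in energy resulting from alteration of $\psi$ to $\psi'$ is:

$$\begin{aligned}
\Delta E(\psi) &= E(\psi', \phi) - E(\psi, \phi) \\
&= (-\psi'\Psi\phi + \psi' \cdot \theta_1 + \phi \cdot \theta_2) - (-\psi\Psi\phi + \psi \cdot \theta_1 + \phi \cdot \theta_2) \\
&= (-\psi' + \psi) \cdot (\Psi\phi - \theta_1)
\end{aligned}$$

$$\Delta E(\psi) = -\Delta\psi \cdot (\Psi\phi - \theta_1) \tag{57}$$

where $\Delta\psi = \psi' - \psi$. The analysis for the change in $\phi$ is analogous. Expanding the inner product in Eq. 57 yields:

$$\Delta E(\psi) = -\int_{\Omega_2} \Delta\psi_t\, (\Psi_t \cdot \phi - \theta_1)\, dt \tag{58}$$

We next investigate the integrand.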

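The updating equation for $\psi$ (Eq. 48) yields the following possibilities:

$$\begin{cases} \Delta\psi_t \in \{0, +2\} & \text{if } \Psi_t \cdot \phi - \theta_1 > 0 \\ \Delta\psi_t = 0 & \text{if } -\theta_2 \le \Psi_t \cdot \phi \le \theta_1 \\ \Delta\psi_t \in \{0, -2\} & \text{if } \Psi_t \cdot \phi + \theta_2 < 0 \end{cases} \tag{59}$$

In the third case $\Psi_t \cdot \phi - \theta_1 < \Psi_t \cdot \phi + \theta_2 < 0$, because $\theta_1, \theta_2 > 0$. Hence, in all three cases:

$$\Delta\psi_t\, (\Psi_t \cdot \phi - \theta_1) \ge 0 \tag{60}$$

Furthermore, if $\Delta\psi_t \ne 0$ then this inequality is strict. A change of state means that $\Delta\psi \ne 0$, so this strict inequality must hold for a set of $t \in \Omega_2$ with nonzero measure. This implies the strict monotonicity of $\Delta E(\psi)$:

$$\Delta E(\psi) = -\int_{\Omega_2} \Delta\psi_t\, (\Psi_t \cdot \phi - \theta_1)\, dt < 0 \tag{61}$$

Hence, until the system stabilizes, every $\phi \to \psi$ step of the iteration decreases the energy. Analogous reasoning shows that the energy also decreases with the $\psi \to \phi$ steps. ■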

We have shown that every step decreases the energy. We now show that the energy cannot decrease 
forever, because it has a lower bound. 



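Theorem (Energy Bounded From Below): The energy is bounded from below by a quantity that depends only on $\Psi$ (for fixed thresholds and domains).

Proof: Recall (Eq. 56) that the energy is defined: $E(\psi, \phi) = -\psi\Psi\phi + \psi \cdot \theta_1 + \phi \cdot \theta_2$. Since $\Psi$, $\theta_1$, and $\theta_2$ are fixed, there are clearly bipolar $\psi$ and $\phi$ that minimize this expression.

We can derive an explicit lower bound as follows. Since $|\psi_s| \le 1$ and $|\phi_t| \le 1$, we have the inequality:

$$\psi\Psi\phi = \int_{\Omega_2}\int_{\Omega_1} \psi_s\, \Psi_{st}\, \phi_t\, dt\, ds \le \int_{\Omega_2}\int_{\Omega_1} M\, dt\, ds = M\, |\Omega_1|\, |\Omega_2|$$

where $M = \max_{s,t} |\Psi_{st}|$. Therefore,

$$\begin{aligned}
E(\psi, \phi) &= -\psi\Psi\phi + \psi \cdot \theta_1 + \phi \cdot \theta_2 \\
&\ge -M\,|\Omega_1|\,|\Omega_2| + \psi \cdot \theta_1 + \phi \cdot \theta_2 \\
&\ge -M\,|\Omega_1|\,|\Omega_2| - \theta_1\,|\Omega_2| - \theta_2\,|\Omega_1|
\end{aligned}$$

This is our lower bound. It is fixed, since it depends only on $\Psi$ (given the thresholds and domains). ■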
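Both energy theorems can be checked in a discrete setting. This sketch uses random stored pairs and a random bipolar starting state (all hypothetical), unit measure on both domains, and records $E(\psi, \phi)$ from Eq. 56 after every half-step, comparing it with the lower bound $-M|\Omega_1||\Omega_2| - \theta_1|\Omega_2| - \theta_2|\Omega_1|$. For the $\phi$ update it uses $\theta_2$ as the upper threshold; this is our reading of “analogous”, the one under which the $\Delta E(\phi)$ argument goes through.

```python
import random

def store(pairs):
    """Eq. 45: Psi[s][t] = sum over k of psi^(k)_s * phi^(k)_t."""
    return [[sum(psi_k[s] * phi_k[t] for psi_k, phi_k in pairs)
             for t in range(len(pairs[0][1]))] for s in range(len(pairs[0][0]))]

def S(old, drive, upper, lower):
    """Eq. 48: +1 above `upper`, -1 below `-lower`, otherwise unchanged."""
    return [1.0 if d > upper else -1.0 if d < -lower else o for o, d in zip(old, drive)]

def energy(psi, phi, Psi, theta1, theta2):
    """Eq. 56 with unit measure and constant threshold fields."""
    psi_Psi_phi = sum(psi[s] * Psi[s][t] * phi[t]
                      for s in range(len(psi)) for t in range(len(phi)))
    return -psi_Psi_phi + theta1 * sum(psi) + theta2 * sum(phi)

def bipolar(n):
    return [random.choice((-1.0, 1.0)) for _ in range(n)]

random.seed(2)
n1, n2, m, theta1, theta2 = 20, 16, 4, 3.0, 3.0
Psi = store([(bipolar(n2), bipolar(n1)) for _ in range(m)])
psi, phi = bipolar(n2), bipolar(n1)

history = [energy(psi, phi, Psi, theta1, theta2)]
changed = []
for _ in range(10):
    # phi -> psi half-step (Eq. 46): upper threshold theta1, lower theta2.
    new_psi = S(psi, [sum(w * p for w, p in zip(row, phi)) for row in Psi], theta1, theta2)
    changed.append(new_psi != psi)
    psi = new_psi
    history.append(energy(psi, phi, Psi, theta1, theta2))
    # psi -> phi half-step (Eq. 47): upper threshold theta2, lower theta1 (our reading).
    drive = [sum(psi[s] * Psi[s][t] for s in range(n2)) for t in range(n1)]
    new_phi = S(phi, drive, theta2, theta1)
    changed.append(new_phi != phi)
    phi = new_phi
    history.append(energy(psi, phi, Psi, theta1, theta2))

M = max(abs(v) for row in Psi for v in row)
lower_bound = -M * n1 * n2 - theta1 * n2 - theta2 * n1
```

Every entry of `Psi` is an integer here, so the energies are computed exactly and a strict decrease is not masked by rounding.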

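Together, the preceding theorems show that every change of state decreases the energy and that the energy is bounded below, so the bidirectional associative field memory approaches a stable state; under the conditions of the recall theorem, this is one of the states characterizing a stored associative pair.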

5. CONCLUSIONS 

We have argued that AI is moving by necessity into a new phase that recognizes the role of nonpropositional knowledge in intelligent behavior. We also argued that the “new” AI must make use of massive parallelism to achieve its ends. We proposed a definition of massive parallelism, namely that the number of processing elements can be taken as a continuous quantity. We believe that this definition will encourage the development of the necessary theoretical basis for neurocomputers, optical computers, molecular computers, and a new generation of analog computers. We claimed that these computing technologies can be profitably viewed as field computers, computers that operate on entire fields of data in parallel. We discussed the importance of general-purpose field computers, and related them to universal field computers. This was followed by a theoretical model of field computation, including the derivation of several generalizations of Taylor’s theorem for field transformations. These theorems provide one theoretical basis for universal field computers. Finally, we illustrated our theory of field computation by analyzing a continuous field version of Kosko’s bidirectional associative memory.